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COMPUTER-AIDED  DISCOVERY  OF  A 
FAST  MATRIX-MULTIPLICATION  ALGORITHM 


I.  INTRODUCTION 


Multiplying  two  n-by-n  matrices  by  straightforward  evaluation  of 
the  usual  definition, 


Zik 


n 


V X..Y. 
Z-i  ij  jk 

3-1 


y 


involves  multiplying  n^  pairs  of  numbers  and  performing  a 

proportionate  number  of  other  elementary  operations,  such  as  additions 

the  total  number  of  operations  is  O(n^)  as  n increases.  A celebrated 
algorithm  of  Strassen's  [1]  requires  only  0(na)  operations,  where  the 
exponent  a is  log27,  or  about  2.807.  Strassen's  is  one  of  a class  of 

similar  algorithms.  Each  algorithm  of  the  class  is  based  on  a method 

for  reducing  the  problem  of  multiplying  two  n-by-n  matrices  to  that  of 
multiplying  M pairs  of  rn/N~'-by-rn/N  matrices ,* where  n is  arbitrary 
and  M and  N are  fixed  integers  characteristic  of  the  algorithm.  The 
total  number  of  operations  used  by  the  algorithm  is  0(na),  where 
a = logflM  (provided  logijM  > 2). 

For  Strassen,  N “ 2 and  M = 7.  Winograd  [2]  has  shown  that  when 
N = 2,  the  best  attainable  M is  7. 

An  algorithm  due  to  Laderman  [3]  has  N * 3 and  M = 23.  With  N ■ 3 
it  is  an  open  question  whether  smaller  values  of  M are  attainable; 
improvement  over  the  best  known  value  of  a would  require  M £ 21.  When 
N * 4,  Strassen's  algorithm  achieves  M ■ 49,  with  M < 48  needed  for 
improvement.  With  N = 5,  Schachtel  [4]  has  given  an  algorithm  with 
M ■ 103,  and  M £ 89  is  needed  for  improvement  over  known  results. 
Strassen '8  result  has  so  far  been  surpassed  only  by  Pan  [5],  who 
recently  described  a family  of  algorithms  one  of  which  has  N * 70  and 
M = 143640.  The  corresponding  exponent  a is  log7Ql43640,  or  about 
2.795. 


♦The  upper  half-brackets  denote  the  “ceiling  function”  — the  least  integer  not  less  than  n/N. 
Note:  Manuscript  submitted  March  12, 1979. 
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The  algorithms  mentioned  appear  to  be  products  of  unaided  human 
ingenuity;  Laderman,  at  least,  explicitly  denies  having  used  a computer 
in  obtaining  his  result.  We  report  here  some  results  of  using  a 
computer  to  search  for  such  matrix-multiplication  schemes.  We  wrote  a 
short  APL  version  of  the  proposed  search  procedure  to  gain  some 
experience  before  deciding  whether  to  devote  substantial  effort  to 
writing  a more  efficient  version;  we  set  ourselves  the  goal  of 
reproducing  the  known  results  for  N * 2 and  3.  The  search  with  N * 2 
and  M = 7 was  successful;  the  algorithm  discovered  is  equivalent  to 
Strassen's  in  a sense  that  will  be  made  explicit  further  down.  The 
search  with  N * 3 and  M * 23  neither  failed  ncr  rediscovered  Laderman's 
algorithm;  it  turned  up  an  algorithm  that,  in  the  sense  mentioned,  is 
inequivalent  to  Laderman’s.  This  algorithm  lacks  certain  desirable 
properties  that  Laderman's  has,  but  is  presented  here  for  the  sake  of 
any  clues  it  may  offer  to  the  structure  of  the  class  of  algorithms  it 
belongs  to.  We  have  not  yet  improved  on  previously  known  values  of  N 
and  M. 

In  the  next  section,  partly  to  establish  some  notation,  we  give  a 
brief  background  discussion  of  the  form  of  the  algorithms  we  are 
considering.  In  the  third  section  we  make  explicit,  as  promised,  a 
notion  of  equivalence  of  two  such  algorithms.  In  the  fourth  section, 
we  describe  the  search  procedure,  and  in  the  fifth  we  present  the 
algorithm  discovered. 


II.  FORM  OF  THE  ALGORITHMS 


Each  of  the  algorithms  uses  a scheme  for  multiplying  N-by-N 
matrices  that  is  of  the  form 


(1) 


Z 

nm 


M 

£ 

r*l 


c(r) 

mn 


9 


where  A^r^,  B^r^,  and  are  fixed  N-by-N  matrices  of  real 

numbers.  Such  a scheme  does  not  depend  on  commutativity  of  the 
elements  Xjj  and  Yjq  of  the  matrices  being  multiplied — it  works 
even  when  Xj:  and  Y^j  belong  to  some  noncommutative  algebra  over 
the  real  numbers.  In  particular,  Xf j and  Yjd  may  be  matrices:  if 
X and  Y are  n-by-n  matrices  of  real  numbers,  and  n is  a multiple  of  N, 
then  we  may,  by  partitioning,  regard  X and  Y as  N-by-N  matrices  whose 
elements  X^j  and  Y;q  are  (n/N)-by-(n/N)  matrices.  If  n is  not 
originally  a multiple  of  N,  we  may  pad  X and  Y with  rows  and  columns  of 
zeros  until  their  size  becomes  a multiple  of  N.  In  any  case,  (1)  gives 
us  a method  for  computing  the  product  Z of  X and  Y by  multiplying  M 
pairs  of  smaller  matrices,  of  the  size  of  Xjj  and  Y^.  We  compute 
each  of  the  products  of  smaller  matrices  by  applying  the  same  method 
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recursively;  ultimately  the  problem  reduces  to  one  of  multiplying 
1-by-l  matrices. 

Besides  the  M multiplications  of  pairs  of  ("smaller")  matrices,  (1) 

involves  several  multiplications  of  matrices  by  scalar  coefficients 

A..  , B,,  , and  C r . For  Strassen's,  Laderman's,  and  Schachtel's 
kl  mn 

algorithms,  but  not  for  the  one  we  will  present  here,  the  scalar 
coefficients  are  all  either  0,  +1,  or  -1,  and  the  corresponding 
multiplications  consequently  become  trivial.  This  simple  form  for  the 
coefficients  reduces  the  cost  of  an  algorithm  by  a considerable 
constant  factor  and  is  therefore  important  practically;  however,  the 
asymptotic  exponent  a is  not  affected:  in  the  bound  0(na)  on  the 
cost  of  the  algorithm,  we  still  have  a ■ logjjM  whether  the 
coefficients  are  0's,  l's,  and  -l's  or  are  arbitrary  floating-point 
numbers . 


III.  EQUIVALENCE 


A necessary  and  sufficient  condition  for  (1)  to  define  Z as  the 
matrix  product  of  X and  Y,  as  opposed  to  some  other  bilinear  function, 
is  that 


(2) 


a^YtV0 

i j kl  mn 


i .6.-6 
ni  jk  lm 


A number  of  simple  transformations  on  the  families  A,  B,  and  C of 
coefficients  carry  solutions  of  (2)  into  other  solutions  of  (2).  Such 
transformations  may  be  considered  as  elementary  equivalences  between 
the  matrix-product  algorithms  corresponding  to  the  families  of 
coefficients.  Two  of  the  simplest  are  the  replacement 


(3) 


A(rt. 


,<r> 


. A(r'\ 


,<r'> 


,(t') 


for  some  permutation  r ■+  r'  of  the  indices  1,...,M,  and  cyclic 
permutation  of  A,  B,  C: 


(4) 


A,  B,  C 


C,  A,  B 


A third  such  trans format 
the  order  of  A 


rans format  ion  is  transposition  together  with  reversal  of 
, B,  C (we  write  A^r'  for  the  transpose  of  A 'r'): 


(5) 


A(r),  B(r),  C(r) 


~(r)  s(r)  ~(r) 


b 


A fourth  is  to  choose  real  numbers  ar,  br,  and  cr  such  that 
arbrcr  = 1 for  r = and  to  map 


(6) 


*w, 


S(r\  c<t)  » a*(r),  H(,),  « c'r) 

> r r r 


The  fifth  and  last  such  transformation  we  will  list  is  to  choose  three 
nonsingular  N-by-N  matrices  P,  Q,  and  R and  make  the  replacement 


(7) 


A(r), 


B 


(r)  „(r) 


QA(r)R_1, 


RB 


(r)„-l 


PC(r )Q_1 


We  will  call  two  solutions  of  (2),  or  the  corresponding  algorithms, 
equivalent  if  one  can  be  turned  into  the  other  by  a combination  of 
transformations  of  the  types  (3) — (7). 

To  illustrate  the  fifth  type  of  transformation,  we  display  the 
coefficients  of  Strassen's  original  algorithm  [1]  (Table  1)  and  those 
of  a version  due  to  Winograd  [6],  which  uses  the  same  number  of 
multiplications  but  fewer  additions  (Table  2).  Strassen's  algorithm  is 


Table  1 

Coefficients  for 
Strassen's  Algorithm 


r 

^7 

r) 

B 

r) 

7) 

i 

1 

0 

1 

0 

1 

0 

0 

1 

0 

1 

0 

1 

0 

0 

1 

0 

0 

1 

z 

1 

l 

0 

0 

0 

-1 

o 

1 

0 

0 

1 

0 

0 
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0 

0 

0 

-1 

1 

1 

A 

0 

0 

-1 

0 

1 

1 

4 

0 

1 

1 

0 

0 

0 

1 

1 

0 

0 

-1 

0 

J 

0 

0 

0 

1 

1 

0 

6 

-1 

0 

1 

1 

0 

0 
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0 

0 

0 

0 

1 

0 

1 

0 

0 

1 

0 

/ 
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-1 

1 

1 

0 
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Table  2 

Coefficients  for 
Winograd' s Algorithm 
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transformed  by  (7)  into  Winograd's  if  we  set 


>-[?  !].  o-  [i  ?].  ■-  ['!  !]  • 

The  two  algorithms  are  thus  equivalent  in  the  sense  we  have  defined. 

IV.  SEARCH  PROCEDURE 


which  is  a nonnegative  function  of  the  A's,  the  B's,  and  the  C's.  We 
sought  solutions  of  (2)  by  trying  to  minimise  (8).  Although  (8)  is  a 
sixth-degree  polynomial,  it  is  only  quadratic  in  the  A's  when  the  B's 
and  C's  are  held  fixed;  likewise  it  is  quadratic  as  a function  of  the 
B ' 8 alone  or  of  the  C's  alone.  The  APL  program  minimizes  (8)  with 
respect  to  the  C's  while  holding  the  A's  and  B's  fixed,  then  minimizes 
with  respect  to  the  B's  with  fixed  A's  and  C's,  and  continues  thus 
cyclically.  The  reason  for  so  constructing  the  program  was  mainly 
programming  convenience.  One  of  the  APL  primitive  functions,  written 
as  ffi,  produces  solutions  to  sets  of  linear  equations,  including 
least-squares  solutions  to  overdetermined  sets.  It  is  quite 
straightforward  to  express  in  terms  of  this  function  the  solution  to 
quadratic  minimization  problems  such  as  minimizing  (8)  with  respect  to 
the  A's.  In  addition  to  the  cyclic  program  just  described,  a simple 
straight-line  search  program  was  written.  The  two  programs  used  in 
alternation  frequently  proved  to  be  more  effective  than  either  used 
alone . 

One  disadvantage  to  seeking  solutions  of  (2)  by  minimizing  (8)  is 
that  negative  results  are  inconclusive:  if  the  computation  happens  to 
converge  to  a nonzero  local  minimum  of  (8),  that  is  no  proof  that  (8) 
does  not  have  a zero  elsewhere.  Another  difficulty  was  more 
troublesome  in  practice  than  nonzero  local  minima:  "zeros  at 
infinity."  It  is  possible  for  certain  of  the  A's,  B's,  and  C's  to  tend 
to  infinity  in  such  a way  that  (8)  tends  to  zero.  This  difficulty  was 
countered  with  a modification  of  the  expression  the  programs  were 
attempting  to  minimize;  a term 


•E 


+ (B^)2  + (cf^) 

ij  ij 
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was  added  to  (8).  The  coefficient  c was  adjusted  by  trial  and  error, 
interactively,  so  that,  if  possible,  the  magnitudes  of  the  A's,  B's, 
and  C's  would  stay  bounded  or  decrease  at  the  same  time  that  the  value 
of  (8)  was  decreasing.  If  a suitable  value  for  e could  not  be  found, 
new  random  starting  values  were  chosen  for  the  A's,  B's,  and  C's,  and 
the  search  was  begun  again. 

The  procedure  just  described  is  unlikely  to  lead  to  a solution  of 
(2)  in  small  integers,  even  if  one  exists;  with  any  integer  solution, 
transformations  (6)  and  (7)  associate  a whole  family  of  equivalent 
solutions,  most  of  which  do  not  consist  of  integers.  Functions  for 
performing  transformations  of  the  forms  (6)  and  (7)  were  written.  When 
the  minimization  procedure  appeared  to  be  converging  to  a zero  of  (8), 
these  functions  were  used  in  an  attempt  to  assure  that  the  solution 
would  be  expressible  in  a simple  form — if  possible,  in  terms  of  l's, 

0 ' s , and  -l's. 


V.  THE  NEW  ALGORITHM 


The  solution  we  obtained,  after  simplification,  is  shown  in  Table 
3.  We  have  not  succeeded  in  transforming  the  solution  to  a form 
consisting  entirely  of  small  integers:  there  remain  several  rational 
numbers  with  2's  and  3's  in  their  numerators  and  denominators.  In  this 
respect,  and  in  general  lack  of  symmetry,  this  solution  compares 
distinctly  unfavorably  with  the  coefficients  of  Laderman's  algorithm, 
which  are  given  in  Table  4.  The  algorithm  resulting  from  the  new 
solution  does,  however,  have  the  same  exponent  a = log323  as 
Laderman's,  and  it  is  provably  inequivalent  to  Laderman's. 

To  prove  inequivalence,  we  point  out  that,  except  for  permutations, 
the  transformations  (3) — (7)  leave  the  ranks  of  the  matrices  A'T', 

B^r ^ , and  C'r'  unchanged.  All  the  matrices  in  Table  3 have  rank  1 
or  2.  But  six  of  the  matrices  in  Table  4 (A'^,  for  instance)  have 
rank  3.  Therefore,  no  combination  of  transformations  (3) — (7)  can 
change  the  solution  in  Table  4 into  that  in  Table  3.  That  is,  the  two 
algorithms  with  coefficients  in  Tables  3 and  4 are  inequivalent  in  the 
sense  we  have  defined. 
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Table  3 (continued) — Coefficients  for  New  Algorithm 
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Table  4 — Coefficients  f^r  Laderman's  Algorithm 


1 1 1 

1 -1-1  0 

0 -1  -1 

1 0 0 

2-100 
0 0 0 

0 0 0 

3 0 10 

0 0 0 

-10  0 

4 110 

0 0 0 

0 0 0 

5 110 

0 0 0 

1 0 0 

6 0 0 0 

0 0 0 

-10  0 

7 0 0 0 

1 1 0 

-10  0 

8 0 0 0 

1 0 0 

0 0 0 

9 0 0 0 

1 1 0 

1 1 1 

10  0 -1  -1 

-1  -1  0 

0 0 0 

11  0 0 0 

0 1 0 

0 0-1 
12  0 0 0 

0 1 1 


0 0 0 

0 1 0 

0 0 0 

0-10 
0 1 0 

0 0 0 

-1  1 0 

1 -1  -1 
-1  0 1 

1 -1  0 

0 1 0 

0 0 0 

-1  1 0 

0 0 0 

0 0 0 

1 0 0 

0 0 0 

0 0 0 

1 0 -1 
0 0 1 

0 0 0 

0 0 1 

0 0-1 
0 0 0 

-1  0 1 

0 0 0 

0 0 0 

0 0 0 

0 0 1 

0 0 0 

-1  0 1 

1 -1  -1 
-1  1 0 

0 0 0 

0 1 0 

1 -1  0 


0 0 0 
1 0 0 
0 0 0 

0 1 0 
0 1 0 
0 0 0 

0 1 0 
0 0 0 
0 0 0 

0 1 0 
1 1 0 
0 0 0 

0 0 0 
1 1 0 
0 0 0 

1 1 1 
1 1 0 
1 0 1 

0 0 1 
0 0 0 
1 0 1 

0 0 1 
0 0 0 
0 0 1 

0 0 0 
0 0 0 
1 0 1 

0 0 0 
0 0 0 
1 0 0 

0 0 1 
0 0 0 
0 0 0 

0 0 1 
1 0 1 
0 0 0 


0 0 1 

13  0 0 0 

0 0-1 

0 0 1 

14  0 0 0 

0 0 0 

0 0 0 

15  0 0 0 

0 1 1 

0 0-1 

16  0 1 1 

0 0 0 

0 0 1 

17  00-1 

0 0 0 

0 0 0 

18  0 1 1 

0 0 0 

0 1 0 

19  0 0 0 

0 0 0 

0 0 0 

20  0 0 1 

0 0 0 

0 0 0 

21  10  0 

0 0 0 

0 0 0 

22  0 0 0 

1 0 0 

0 0 0 

23  0 0 0 

0 0 1 


0 0 0 

0 1 0 

0-10 

0 0 0 

0 0 0 

1 0 0 

0 0 0 

0 0 0 

-1  1 0 

0 0 0 

0 0 1 

1 0 -1 

0 0 0 

0 0 1 

0 0-1 

0 0 0 

0 0 0 

-1  0 1 

0 0 0 

1 0 0 

0 0 0 

0 0 0 

0 0 0 

0 1 0 

0 0 1 

0 0 0 

0 0 0 

0 1 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 1 


0 0 1 
0 0 1 
0 0 0 

1 1 1 
1 0 1 
1 1 0 

0 0 0 
1 0 1 
0 0 0 

0 1 0 
0 0 0 
1 1 0 

0 1 0 
0 0 0 
0 1 0 

0 0 0 
0 0 0 
1 1 0 

1 0 0 
0 0 0 
0 0 0 

0 0 0 
0 1 0 
0 0 0 

0 0 0 
0 0 0 
0 1 0 

0 0 0 
0 0 1 
0 0 0 

0 0 0 
0 0 0 
0 0 1 
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